23 research outputs found

    Closed sets based discovery of small covers for association rules

    In this paper, we address the problem of the understandability and usefulness of the set of discovered association rules. This problem is important since real-life databases most of the time yield several thousand rules with high confidence. We therefore propose new algorithms based on Galois closed sets to limit the extraction to small informative covers for exact and approximate rules, and to small structural covers for approximate rules. Once the frequent closed itemsets, which constitute a generating set for both frequent itemsets and association rules, have been discovered, no additional database pass is needed to derive these covers. Experiments conducted on real-life databases show that these algorithms are efficient and valuable in practice.

    Closed sets based discovery of small covers for association rules (extended version)

    In this paper, we address the problem of the usefulness of the set of discovered association rules. This problem is important since real-life databases most of the time yield several thousand rules with high confidence. We propose new algorithms based on Galois closed sets to reduce the extraction to small covers (or bases) for exact and approximate rules, adapted from the lattice theory and data analysis domains. Once the frequent closed itemsets, which constitute a generating set for both frequent itemsets and association rules, have been discovered, no additional database pass is needed to derive these bases. Experiments conducted on real-life databases show that these algorithms are efficient and valuable in practice.
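    The covers above are built from Galois closures. As a minimal illustrative sketch (toy data, not the papers' actual code), with a transaction database represented as sets of items, the closure of an itemset is the intersection of all transactions containing it:

```python
def closure(itemset, transactions):
    """Galois closure: the set of items common to every transaction
    that contains `itemset`."""
    supporting = [t for t in transactions if itemset <= t]
    if not supporting:          # itemset occurs nowhere; return it unchanged here
        return itemset
    return frozenset.intersection(*supporting)

# Hypothetical example database
transactions = [frozenset(t) for t in
                ({"a", "c", "d"}, {"b", "c", "e"}, {"a", "b", "c", "e"},
                 {"b", "e"}, {"a", "b", "c", "e"})]

print(sorted(closure(frozenset({"a"}), transactions)))   # ['a', 'c']
```

    An itemset is closed exactly when it equals its own closure, which is why the frequent closed itemsets suffice to regenerate all frequent itemsets and their supports.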

    Mining bases for association rules using closed sets

    Association rules are conditional implications between frequent itemsets. The problem of the usefulness and the relevance of the set of discovered association rules stems from the huge number of rules extracted and the many redundancies among these rules for many datasets. We address this important problem using the Galois connection framework, and we show that we can generate bases for association rules using the frequent closed itemsets extracted by the Close or A-Close algorithms.
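    A brute-force sketch of the idea behind such bases (illustration only, not Close or A-Close themselves): each minimal generator g of a closed itemset c yields an exact rule g => c \ g with confidence 1.

```python
from itertools import combinations

def support(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t)

def minimal_generators(closed, transactions):
    """Minimal subsets of `closed` having the same support as `closed`
    (exhaustive search, for illustration only)."""
    target = support(closed, transactions)
    gens = []
    for k in range(len(closed) + 1):
        for cand in map(frozenset, combinations(sorted(closed), k)):
            if support(cand, transactions) == target and \
               not any(g <= cand for g in gens):
                gens.append(cand)
    return gens

# Hypothetical example database; {b, e} is a closed itemset in it
transactions = [frozenset(t) for t in
                ({"a", "c"}, {"b", "c", "e"}, {"a", "b", "c", "e"}, {"b", "e"})]
closed = frozenset({"b", "e"})
for g in minimal_generators(closed, transactions):
    if g != closed:
        print(set(g), "=>", set(closed - g), "(confidence 1)")
```

    Here both {b} and {e} are minimal generators of {b, e}, giving the exact rules b => e and e => b.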

    Computing iceberg concept lattices with Titanic

    We introduce the notion of iceberg concept lattices and show their use in knowledge discovery in databases. Iceberg lattices are a conceptual clustering method well suited for analyzing very large databases. They also serve as a condensed representation of frequent itemsets, as a starting point for computing bases of association rules, and as a visualization method for association rules. Iceberg concept lattices are based on the theory of Formal Concept Analysis, a mathematical theory with applications in data analysis, information retrieval, and knowledge discovery. We present a new algorithm called TITANIC for computing (iceberg) concept lattices. It is based on data mining techniques with a level-wise approach. In fact, TITANIC can be used for a more general problem: computing arbitrary closure systems when the closure operator comes with a so-called weight function. The use of weight functions for computing closure systems has not previously been discussed in the literature. Applications providing such a weight function include association rule mining, functional dependencies in databases, conceptual clustering, and ontology engineering. The algorithm is experimentally evaluated and compared with Ganter's Next-Closure algorithm. The evaluation shows an important gain in efficiency, especially for weakly correlated data.

    Generating a condensed representation for association rules

    Association rule extraction from operational datasets often produces several tens of thousands, and even millions, of association rules. Moreover, many of these rules are redundant and thus useless. Using a semantics based on the closure of the Galois connection, we define a condensed representation for association rules. This representation is characterized by the frequent closed itemsets and their generators. It contains the non-redundant association rules with minimal antecedent and maximal consequent, called min-max association rules. We argue that these rules are the most relevant, since they are the most general non-redundant association rules. Furthermore, this representation is a basis, i.e., a generating set for all association rules together with their supports and confidences, all of which can be retrieved without accessing the data. We introduce algorithms for extracting this basis and for reconstructing all association rules. Results of experiments carried out on real datasets show the usefulness of this approach. To generate this basis when an algorithm for extracting frequent itemsets, such as Apriori, is used, we also present an algorithm for deriving the frequent closed itemsets and their generators from the frequent itemsets without using the dataset.
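    The last step, identifying generators from the frequent itemsets and their supports alone, can be sketched as follows (a minimal illustration with made-up supports, not the paper's algorithm): an itemset is a generator exactly when no immediate subset has the same support.

```python
def key_generators(freq, n_transactions):
    """Generators among the frequent itemsets: itemsets whose support
    differs from that of every immediate subset. `freq` maps frozensets
    to supports; the empty set's support is the number of transactions."""
    def s(X):
        return n_transactions if not X else freq[X]
    return {X for X in freq if X and all(freq[X] != s(X - {i}) for i in X)}

# Supports of all frequent itemsets of a hypothetical 4-transaction database
freq = {frozenset({"a"}): 2, frozenset({"b"}): 3, frozenset({"c"}): 3,
        frozenset({"e"}): 3, frozenset({"a", "c"}): 2, frozenset({"b", "c"}): 2,
        frozenset({"b", "e"}): 3, frozenset({"c", "e"}): 2,
        frozenset({"b", "c", "e"}): 2}
for g in sorted(key_generators(freq, 4), key=sorted):
    print(sorted(g))
```

    Here {a, c} is not a generator because it has the same support as {a}, whereas {b, c} is, since its support differs from both {b} and {c}.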

    Pascal: an algorithm for extracting frequent patterns

    In this paper, we propose the Pascal algorithm, which introduces a new optimization of the reference algorithm Apriori. This optimization is based on pattern counting inference, which relies on the concept of key patterns: the support of frequent non-key patterns can be inferred from the support of the key patterns without accessing the database. Experimentally, a comparison of Pascal with Apriori, Close and Max-Miner demonstrates its efficiency. Key patterns also make it possible to define informative association rules, which are potentially more useful than the complete set of association rules and far fewer in number.

    Levelwise search of frequent patterns with counting inference

    Peer-reviewed national conference paper. In this paper, we address the problem of the efficiency of the main phase of most data mining applications: frequent pattern extraction. This problem is mainly related to the number of operations required for counting pattern supports in the database, and we propose a new method, called pattern counting inference, that performs as few support counts as possible. Using this method, the support of a pattern is determined without accessing the database whenever possible, using the supports of some of its sub-patterns called key patterns. This method is implemented in the Pascal algorithm, an optimization of the simple and efficient Apriori algorithm. Experiments comparing Pascal to the Apriori, Close and Max-Miner algorithms, each representative of a frequent pattern discovery strategy, show that Pascal improves the efficiency of frequent pattern extraction from correlated data and that it induces no additional execution time when data is weakly correlated.
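    The inference step itself is a one-liner. As a minimal sketch (with hypothetical support numbers): once a pattern is known not to be a key pattern, its support equals the minimum support among its immediate subsets, so no database scan is needed.

```python
def inferred_support(pattern, supports):
    """Pattern counting inference: for a NON-KEY pattern, the support is
    the minimum of the supports of its immediate subsets. (For key
    patterns this only gives an upper bound, and a real count is needed.)"""
    return min(supports[pattern - {i}] for i in pattern)

# Hypothetical supports of the immediate subsets of {a, b, c}
supports = {frozenset({"a", "b"}): 4, frozenset({"a", "c"}): 6,
            frozenset({"b", "c"}): 5}
print(inferred_support(frozenset({"a", "b", "c"}), supports))   # 4
```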

    Algorithmics of the lattice of closed sets (applications to formal concept analysis and databases)

    A closure on a lattice is an operator satisfying monotonicity, extensivity and idempotence. The image of a closure is a lattice, called the lattice of closed sets. This lattice can be represented by implication rules; such representations are called implication bases. In this thesis, we propose algorithms for generating the lattice of closed sets of a closure on a powerset. We show on real cases the influence of the number of closure calls on the running time of the algorithms. We then propose incremental algorithms that generate the concept lattice, the Dedekind-MacNeille completion of an order, and the lattice of maximal antichains of an order; the complexities obtained improve on those of earlier algorithms. We also propose an algorithm, Impec, that computes the canonical basis of implications for any closure on a powerset. In formal concept analysis, we propose the lattice of strong concepts, formed by the concepts having a sufficient number of objects (from the user's point of view), as a classification tool. Generating the lattice of strong concepts is a new problem, which we formalize, and we show that the lattice of strong concepts is a formal support for extracting association rules; we apply the algorithms from the first part of this thesis to its generation. In databases, we show that the Impec algorithm solves, in a generic and efficient way, the problems of projecting functional dependencies, building an Armstrong relation, and inferring functional dependencies. Keywords: closures, lattices, implication rules, algorithms, formal concept analysis, database design.
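    The three closure axioms can be checked mechanically on small universes. A minimal sketch (the operator below is an assumption for illustration: it adds 'c' whenever 'a' is present):

```python
from itertools import combinations

def subsets(universe):
    """All subsets of `universe`, as frozensets."""
    return [frozenset(c) for k in range(len(universe) + 1)
            for c in combinations(sorted(universe), k)]

def is_closure_operator(f, universe):
    """Brute-force check of extensivity, monotonicity and idempotence
    over the whole powerset of `universe`."""
    ps = subsets(universe)
    extensive  = all(X <= f(X) for X in ps)
    monotone   = all(f(X) <= f(Y) for X in ps for Y in ps if X <= Y)
    idempotent = all(f(f(X)) == f(X) for X in ps)
    return extensive and monotone and idempotent

# Toy operator: 'a' always brings 'c' (hypothetical example)
f = lambda X: X | {"c"} if "a" in X else X
print(is_closure_operator(f, {"a", "b", "c"}))   # True
```

    The closed sets of such an operator are exactly its fixed points, and the implication "a implies c" is one line of an implication base for it.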

    Computing Proper Implications

    Peer-reviewed international conference paper. This paper presents proper implications: all implications holding on a set that have a minimal left-hand side and a one-item right-hand side. Although not the smallest representation, they are easily readable and allow for efficient selection and projection (embedding) operations. The proposed algorithm, Impec, is designed to efficiently find the proper implications given a set and a closure operator on it. In addition, it can easily be extended with a weight function or to compute embedded implications.
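    A brute-force sketch of the definition (illustration only, not the Impec algorithm): enumerate candidate left-hand sides level-wise and keep X -> y when y is in closure(X) but not implied by any immediate subset of X, which by monotonicity guarantees X is minimal for y.

```python
from itertools import combinations

def proper_implications(items, closure):
    """All proper implications X -> y: y in closure(X) \ X, with X minimal
    for y. Checking immediate subsets suffices because closure operators
    are monotone."""
    result = []
    for k in range(len(items) + 1):
        for X in map(frozenset, combinations(sorted(items), k)):
            for y in sorted(closure(X) - X):
                if not any(y in closure(X - {x}) for x in X):
                    result.append((X, y))
    return result

# Toy closure operator (hypothetical): 'a' always brings 'c'
closure = lambda X: X | {"c"} if "a" in X else X
print(proper_implications({"a", "b", "c"}, closure))   # [(frozenset({'a'}), 'c')]
```

    Here the only proper implication is {a} -> c: the candidate {a, b} -> c is rejected because its subset {a} already implies c.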